International Journal of Medical Informatics
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Large language models (LLMs) are increasingly explored as tools for healthcare research and data analysis. However, their applicability to structured public health datasets, especially in non-English contexts, remains underexamined. We systematically evaluated 11 state-of-the-art LLMs on their ability to generate executable Python code for analytical queries over Czech public health datasets, focusing on incidence and prevalence data provided by the National Health Information Portal (known as N...
Background: Rare neuromuscular diseases such as polyneuropathy (PN) and myopathy (MY) often share symptomatic characteristics, leading to diagnostic challenges and delays. Machine learning applied to routine care data of electronic health records (EHRs) offers the potential for accelerating accurate diagnosis. Objective: To develop and evaluate machine learning models to distinguish between patients with PN and MY using EHR data, as a step toward tools that could support improved diagnostic process...
Public health policies increasingly rely on the use of complex and large datasets containing heterogeneous, multimodal data that require advanced analytical methods to extract meaningful insights and support evidence-based decision-making. Essential for the sharing and analysis of public health data is the description of the data ("data about data") or metadata. Indeed, a lack of metadata standards has been identified as a key technical barrier to public health data sharing [1]. Metadata varies c...
Background: Albuminuria is associated with increased risk of cardiovascular disease (CVD), heart failure, and progression of chronic kidney disease (CKD). Early detection of albuminuria, done through spot urine albumin creatinine ratio (UACR) testing, enables more accurate risk stratification and timely use of preventative therapies. However, UACR testing remains unacceptably low in the hypertension population. Methods: We evaluated two EHR-embedded clinical decision support (CDS) strategies at Geisinger Health Syste...
Large language models (LLMs) have shown incredible promise in medicine. While LLMs may be particularly useful in areas requiring extensive review of clinical records, their use remains limited due to their tendency to hallucinate and fabricate information. Hallucination issues, as well as their consequences, are exacerbated in low-probability, high-stakes scenarios such as rare adverse safety events or medical errors. We present SAFE-AI (Structured and Automated Framework for Explainable AI), a ...
The use of Electronic Health Records (EHRs) has increased significantly in recent years. However, a substantial portion of the clinical data remains in unstructured text formats, especially in the context of radiology. This limits the application of EHRs for automated analysis in oncology research. Pretrained language models have been utilized to extract feature embeddings from these reports for downstream clinical applications, such as treatment response and survival prediction. However, a thor...
Rare diseases affect millions worldwide and are associated with long diagnostic delays, limited access to treatments, and substantial challenges in daily care and coordination. Digital health technologies, including mobile apps, telehealth, and data-sharing platforms, offer opportunities to improve care and quality of life for people living with rare diseases. As these tools rapidly expand, this study examines the needs, expectations, and conditions for successful adoption of patient-centered di...
Purpose: Natural Language Processing (NLP) has the potential to extract structured clinical knowledge from unstructured Electronic Health Records (EHRs). However, the limited availability of annotated datasets for algorithm training restricts its application in clinical practice. This study investigates the use of transformer-based NLP models to structure Italian EHRs in cardiac settings, addressing this gap. Methods: We implemented and evaluated three named entity recognition algorithms: SpaCy, Fl...
Digital therapeutics (DTx) are patient-facing apps designed to support individuals in their daily lives. Therefore, they have the potential to revolutionize healthcare by empowering and engaging patients to become active players in their own care. Despite the increasing adoption of DTx in national healthcare systems, research on their design remains limited. The present study introduces "DiGATax", a taxonomy designed to categorize and analyze DTx, including perspectives on content, intervention ...
Background: Heart failure (HF), including heart failure with preserved ejection fraction (HFpEF) and heart failure with reduced ejection fraction (HFrEF), remains a major global health challenge, particularly among aging populations. Timely and accurate prediction of severe adverse outcomes associated with HF is critical for optimizing care, reducing disease burden, and improving outcomes. Although social determinants of health (SDoH) have been recognized as key drivers of HF disparities and assoc...
Introduction: Large Language Models (LLMs) in healthcare practice and education have been evaluated using medical question-answering (QA) datasets, with excellent performance. However, multiple-choice questions fall short when assessing more complex language interactions. Objective: To evaluate the time invested and validity of medical students' responses to clinical questions using ArkangelAI, compared to traditional search methods. Methods: Randomized, double-blind trial with clinical medical stude...
Objective: Adverse events (AEs) resulting from medical interventions are significant contributors to patient morbidity, mortality, and healthcare costs. Prediction of these events using electronic health records (EHRs) can facilitate timely clinical interventions. However, effective prediction remains challenging due to severe class imbalance, missing labels, and the complexity of EHR records. Classical machine learning approaches frequently underperform due to insufficient representation of minor...
IMPORTANCE: Although angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin receptor blockers (ARBs) are recommended for people with chronic kidney disease (CKD), they remain underused. Barriers to adherence, such as adverse effects or patient refusal, are frequently embedded within unstructured clinical narratives and are therefore inaccessible to structured data analytics. Scalable natural language processing (NLP) approaches are needed to identify these barriers and support guideline-...
Background: Digital health technologies, including artificial intelligence (AI)-powered tools and virtual reality (VR) interventions, are increasingly being deployed to support caregivers of patients with chronic conditions. However, the factors influencing caregiver acceptance of these technologies remain poorly understood. Objective: This study aimed to develop and validate a structural equation model (SEM) to examine the determinants of digital health technology acceptance among caregivers of pa...
Background: Existing information resources about medicines and their indications have limited usefulness for health data analytics. The emerging potential of large language models (LLMs) to generate clinically accurate responses presents a novel opportunity to develop a comprehensive knowledge base of medicines and their clinical indications. Method: Unique medications from the English Prescribing Dataset (EPD) were extracted and included in a fine-tuned prompt pipeline using the GPT-4 and MedCAT L...
While Americans are using herbal dietary supplements (natural products) more than ever, the consumption of natural products with prescription drugs can lead to harmful interactions. Pharmacovigilance of natural products depends on careful expert review and interpretation of a wide variety of evidence. In prior work, we demonstrated the value of a knowledge graph (NP-KG) for assisting with natural product safety investigations. However, scaling the NP-KG from 33 natural products to the thousands on...
Point-of-care (POC) blood testing enables rapid, decentralized diagnostics with transformative promise, yet its innovation landscape remains poorly mapped. To this end, we focused on features that we believe are key to making progress in precision healthcare and predictive medicine, such as longitudinal data collection and data analytics integration. While no review can be complete, this work attempts to address this gap by analyzing 86 POC blood testing devices worldwide and proposing a ...
Background: The use of large language models (LLMs) is increasing in the medical field; however, LLMs are often subject to "confabulations." Notably, LLMs are vulnerable to adversarial attacks, or fabricated details within prompts, which is concerning given both health misinformation and inadvertent errors in the medical record. The purpose of this study was to determine the effect of adversarial attacks by embedding one fabricated medication into a list of existing medicines. Methods: A total...
Artificial intelligence (AI) is increasingly integrated into healthcare delivery, yet patient acceptance in resource-constrained settings remains incompletely characterized. This study assessed attitudes toward AI-supported care among patients attending hospitals in three Jordanian governorates (Amman, Balqa, Irbid) and examined demographic and digital literacy correlates of acceptance. In a cross-sectional survey (n = 500 complete questionnaires), participants rated exposure to AI in healthcare...
Objective: Much medical data is only available in unstructured electronic health records (EHRs). These data can be obtained through manual (human) extraction or programmatic natural language processing (NLP) methods. We estimate that NLP only becomes economically competitive with manual extraction when there are ~6500 EHR records. We have found that there is interest from clinicians and researchers in using NLP on projects with fewer records. We examine whether a large language model (LLM) can be ...